Optimizing Many-Threads-to-Many-Cores Mapping in Parallel Electronic System Level Simulation
نویسنده
چکیده
OF THE DISSERTATION Optimizing Many-Threads-to-Many-Cores Mapping in Parallel Electronic System Level Simulation By Guantao Liu Doctor of Philosophy in Computer Engineering University of California, Irvine, 2017 Professor Rainer Dömer, Chair In hardware/software codesign, Discrete Event Simulation (DES) has been in use for decades to verify and validate the functionality of Electronic System Level (ESL) models. Since the parallel computing platforms are readily available today, many Parallel Discrete Event Simulation (PDES) approaches are proposed to improve the simulation performance. However, as the thread parallelism increases in ESL designs and core count multiplies on multi-core and many-core platforms, thread-to-core mapping becomes critical in PDES. In this dissertation, we propose a computationand communication-aware approach to optimize thread mapping for parallel ESL simulation, with the aims of load balancing and communication minimization. As we identify that the order of dispatching parallel threads has a significant influence on the total simulation time, and Longest Job First (LJF) shows better performance than the Linux default thread dispatch policy, we first propose a segmentaware LJF scheduler for PDES. Our segment-aware scheduler can accurately predict the run time of the thread segments ahead, and thus make better dispatching decisions. Next, we define the concept of core distance for multi-core and many-core architectures, which quantifies core-to-core communication latency and characterizes processor hierarchies. For many-core architectures using directory-based cache coherence protocols, we observe that
منابع مشابه
Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of schedulation of hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...
متن کاملClustering Cores for Parallel Thread Execution
In recent years, we have observed a strong trend towards using accelerators, such as GPUs, to speed up scientific applications. This results in a complex heterogeneous system in which traditional CPUs are used for the execution of sequential threads, while GPUs are used for accelerating parallel threads. Instead of following this trend, this paper introduces a new explicitly parallel instructio...
متن کاملEffective cooperative scheduling of task-parallel applications on multiprogrammed parallel architectures
Emerging architecture designs include tens of processing cores on a single chip die; it is believed that the number of cores will reach the hundreds in not so many years from now. However, most common parallel workloads cannot fully utilize such systems. They expose fluctuating parallelism, and do not scale up indefinitely as there is usually a point after which synchronization costs outweigh t...
متن کاملMulticore Performance Optimization Using Partner Cores
As the push for parallelism continues to increase the number of cores on a chip, and add to the complexity of system design, the task of optimizing performance at the application level becomes nearly impossible for the programmer. Much effort has been spent on developing techniques for optimizing performance at runtime, but many techniques for modern processors employ the use of speculative thr...
متن کاملParaWeaver: Performance Evaluation on Programming Models for Fine Grained Threads
There is a trend towards multicore or manycore processors in computer architecture design. In addition, several parallel programming models have been introduced. Some extract concurrent threads implicitly whenever possible, resulting in fine grained threads. Others construct threads by explicit user specifications in the program, resulting in coarse grained threads. How these two mechanisms imp...
متن کامل